Abstract

We consider the problem of recovering polynomials that are sparse with respect 
to the basis of Legendre polynomials from a small number of random samples. In 
particular, we show that a Legendre s-sparse polynomial of maximal degree N can be
recovered from $m \asymp s \log^4(N)$ random samples that are chosen independently according
to the Chebyshev probability measure $d\nu(x) = \pi^{-1}(1 - x^2)^{-1/2}dx$. As an efficient
recovery method, $\ell_1$-minimization can be used. We establish these results by verifying
the restricted isometry property of a preconditioned random Legendre matrix. We then 
extend these results to a large class of orthogonal polynomial systems, including the 
Jacobi polynomials, of which the Legendre polynomials are a special case. Finally, we 
transpose these results into the setting of approximate recovery for functions in certain 
infinite-dimensional function spaces. 

1 Introduction

Compressive sensing has triggered significant research activity in recent years. Its central 
motif is that sparse signals can be recovered from what was previously believed to be highly 
incomplete information [lOl |2D]. In particular, it is now known [ini ISl |35l EQl [311 [32] 
that an s-sparse trigonometric polynomial of maximal degree N can be recovered from
$m \asymp s \log^4(N)$ sampling points. These m samples can be chosen as a random subset from
the discrete set $\{j/N\}_{j=0}^{N-1}$ [13 [131 [3S], or independently from the uniform measure on [0, 1],
see [30, 31, 32].

Until now, all sparse recovery results of this type required that the underlying basis be
uniformly bounded like the trigonometric system, so as to be incoherent with point samples.
As the main contribution of this paper, we show that this condition may be relaxed,
obtaining comparable sparse recovery results for any basis that is bounded by a square- 
integrable envelope function. As a special case, we focus on the Legendre system over the 
domain [-1, 1]. To account for the blow-up of the Legendre system near the endpoints of
its domain, the random sampling points are drawn according to the Chebyshev probability 
measure. This aligns with classical results on Lagrange interpolation which support the 
intuition that Chebyshev points are much better suited for the recovery of polynomials 
than uniform points are [8].

In order to deduce our main results we establish the restricted isometry property (RIP) 
for a preconditioned version of the matrix whose entries are the Legendre polynomials 
evaluated at sample points chosen from the Chebyshev measure. The concept of precon- 
ditioning seems to be new in the context of compressive sensing, although it has appeared 
within the larger scope of sparse approximation in a different context in [36]. It is likely
that the idea of preconditioning can be exploited in other situations of interest as well. 

Sparse expansions of multivariate polynomials in terms of tensor products of Legendre 
polynomials recently appeared in the problem of numerically solving stochastic or para- 
metric PDEs [ini [3]. Our results indeed extend easily to tensor products of Legendre 
polynomials, and the application of our techniques in this context of numerical solution 
of SPDEs seems very promising. Our results may also be transposed into the setting of 
function approximation. In particular, we show that the aforementioned sampling and re- 
construction procedure is guaranteed to produce near-optimal approximations to functions 
in infinite-dimensional spaces of functions having $\ell_p$-summable Fourier-Legendre coefficients
($0 < p < 1$), provided that the maximal polynomial degree in the $\ell_1$-reconstruction
procedure is fixed appropriately in terms of the sparsity level. 

Our original motivation for this work was the recovery of sparse spherical harmonic 
expansions [3| from randomly located samples on the sphere. While our preliminary results 
in this context seem to be only suboptimal [M], the results in the present paper apply at 
least to the recovery of functions on the sphere that are invariant under rotations of the 
sphere around a fixed axis. Sparse spherical harmonic expansions were recently exploited 
with good numerical success in the spherical inpainting problem for the cosmic microwave 
background [T] , but so far this problem had lacked a theoretical understanding. 

We note that the Legendre polynomial transform has fast algorithms for matrix vector 
multiplication; see for instance [281 EH (13 [291 EH] ■ This fact is of crucial importance in 
numerical algorithms used for reconstructing the original function from its sample values 
- especially when the dimension of the problem gets large. 

Our results extend to any polynomial system which is orthogonal with respect to a 
finitely-supported weight function satisfying a mild continuity condition; this includes the 
Jacobi polynomials, of which the Legendre polynomials are a special case. It turns out 
that the Chebyshev measure is universal for this rich class of orthogonal polynomials, in 
the sense that our corresponding result requires the random sampling points to be drawn
according to the Chebyshev measure, independent of the particular weight function. 

Our paper is structured as follows: Section 2 contains the main results for recovery 
of Legendre-sparse polynomials. Section 3 illustrates these results with numerical experiments.
In Section 4 we recall known theorems on $\ell_1$-minimization and in Section 5, we
prove the results presented in Section 2. Section 6 extends the results to a rich class of 
orthogonal polynomial systems, including the Jacobi polynomials, while Section 7 contains 
our main result on the recovery of continuous functions that are well approximated by 
Legendre-sparse polynomials. 

Notation. Let us briefly introduce some helpful notation. The $\ell_p$-norm on $\mathbb{R}^N$ is defined
as $\|z\|_p = \left(\sum_{j=1}^{N} |z_j|^p\right)^{1/p}$, $1 \le p < \infty$, and $\|z\|_\infty = \max_{j=1,\dots,N} |z_j|$ as usual. The
"$\ell_0$-norm", $\|z\|_0 = \#\{j : z_j \neq 0\}$, counts the number of non-zero entries of z. A vector z is
called s-sparse if $\|z\|_0 \le s$, and the error of best s-term approximation of a vector $z \in \mathbb{R}^N$
in $\ell_p$ is defined as

$$\sigma_s(z)_p = \inf_{y : \|y\|_0 \le s} \|y - z\|_p.$$

Clearly, $\sigma_s(z)_p = 0$ if z is s-sparse. Informally, z is called compressible if $\sigma_s(z)_1$ decays
quickly as s increases. A result due to Stechkin, see e.g. [21, Lemma 3.1], states that, for $q < p$,

$$\sigma_s(z)_p \le s^{1/p - 1/q} \|z\|_q; \qquad (1)$$

thus, vectors $x \in B_q^N = \{x \in \mathbb{R}^N, \|x\|_q \le 1\}$ for $0 < q \le 1$ can be considered a good model
for compressible signals.

For $N \in \mathbb{N}$, we use the notation $[N] = \{1, \dots, N\}$. In this article, $C > 0$ will always
denote a universal constant that might be different in each occurrence.

The Chebyshev probability measure (also referred to as arcsine distribution) on [-1, 1]
is given by $d\nu(x) = \pi^{-1}(1 - x^2)^{-1/2}dx$. If a random variable X is uniformly distributed
on $[0, \pi]$, then the random variable $Y = \cos X$ is distributed according to the Chebyshev
measure.

2 Recovery of Legendre-sparse polynomials from a few samples

Consider the problem of recovering a polynomial g from m sample values $g(x_1), \dots, g(x_m)$.
If the number of sampling points is less than or equal to the degree of g, such reconstruction
is impossible in general due to dimension reasons. Therefore, as usual in the compressive
sensing literature, we make a sparsity assumption. In order to introduce a suitable notion of
sparsity we consider the basis of Legendre polynomials $L_n$ on [-1, 1], normalized so as to be
orthonormal with respect to the uniform measure on [-1, 1], i.e. $\frac{1}{2}\int_{-1}^{1} L_n(x) L_\ell(x)\,dx = \delta_{n,\ell}$.
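
The normalization above corresponds to $L_n = \sqrt{2n+1}\,P_n$, where $P_n$ is the classical Legendre polynomial with $P_n(1) = 1$. The following is a minimal Python sketch, not taken from the paper, that evaluates the $L_n$ by the three-term recurrence and checks the orthonormality relation numerically. The function names and the use of composite Simpson quadrature are illustrative assumptions.

```python
import math

def legendre_orthonormal(n_max, x):
    """Return [L_0(x), ..., L_{n_max}(x)] with L_n = sqrt(2n + 1) * P_n."""
    p_prev, p_curr = 1.0, x              # classical P_0 and P_1
    values = [1.0]
    if n_max >= 1:
        values.append(math.sqrt(3.0) * x)
    for k in range(1, n_max):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_next = ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
        values.append(math.sqrt(2 * (k + 1) + 1) * p_next)
        p_prev, p_curr = p_curr, p_next
    return values

def gram_entry(n, l, panels=2000):
    """Approximate (1/2) * integral over [-1, 1] of L_n * L_l (composite Simpson)."""
    h = 2.0 / panels                     # panels must be even
    total = 0.0
    for i in range(panels + 1):
        x = -1.0 + i * h
        vals = legendre_orthonormal(max(n, l), x)
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * vals[n] * vals[l]
    return 0.5 * total * h / 3.0
```

Up to quadrature error, `gram_entry(n, l)` should be close to 1 when `n == l` and close to 0 otherwise.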

An arbitrary real-valued polynomial g of degree N - 1 can be expanded in terms of
Legendre polynomials,

$$g(x) = \sum_{n=0}^{N-1} c_n L_n(x). \qquad (2)$$

If the coefficient vector $c \in \mathbb{R}^N$ is s-sparse, we call the corresponding polynomial Legendre
s-sparse, or simply Legendre-sparse. If $\sigma_s(c)_1$ decays quickly as s increases, then g is called
Legendre-compressible.
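
To make the notion of compressibility concrete, the best s-term approximation error can be computed by discarding the s largest entries in magnitude and taking the $\ell_p$-norm of what remains. The sketch below is illustrative only (the function names are hypothetical); it also evaluates the right-hand side $s^{1/p - 1/q}\|z\|_q$ of Stechkin's estimate for comparison.

```python
def best_s_term_error(z, s, p=1.0):
    """sigma_s(z)_p: l_p norm of z after removing its s largest-magnitude entries."""
    tail = sorted((abs(v) for v in z), reverse=True)[s:]
    return sum(t ** p for t in tail) ** (1.0 / p)

def stechkin_bound(z, s, p, q):
    """Right-hand side s^(1/p - 1/q) * ||z||_q of Stechkin's estimate, for q < p."""
    norm_q = sum(abs(v) ** q for v in z) ** (1.0 / q)
    return s ** (1.0 / p - 1.0 / q) * norm_q

z = [3.0, -1.0, 0.5, 0.0, 2.0]
err = best_s_term_error(z, s=2, p=1.0)          # drops 3.0 and 2.0, keeps 1.0 + 0.5 + 0.0
bound = stechkin_bound(z, s=2, p=1.0, q=0.5)    # upper bound on err
```

For an s-sparse vector the tail is empty and the error is zero, matching the definition.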

We aim to reconstruct Legendre-sparse polynomials, and more generally Legendre-compressible
polynomials, of maximum degree N - 1 from m samples $g(x_1), \dots, g(x_m)$,
where m is desired to be small - at least smaller than N. Writing g in the form (2), this
task clearly amounts to reconstructing the coefficient vector $c \in \mathbb{R}^N$.

To the set of m sample points $(x_1, \dots, x_m)$ we associate the $m \times N$ Legendre matrix
$\Phi$ defined component-wise by

$$\Phi_{j,k} = L_{k-1}(x_j), \quad j \in [m],\ k \in [N]. \qquad (3)$$

Note that the samples $y_j = g(x_j)$ may be expressed concisely in terms of the coefficient
vector $c \in \mathbb{R}^N$ according to

$$y = \Phi c.$$
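
The sampling model can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: it draws Chebyshev-distributed points as the cosine of a uniform angle on $[0, \pi]$, assembles the Legendre matrix with orthonormal columns $L_0, \dots, L_{N-1}$, and forms the samples as a matrix-vector product. The helper names, the fixed seed, and the 2-sparse test vector are assumptions made for the example.

```python
import math
import random

def legendre_orthonormal(n_max, x):
    """Return [L_0(x), ..., L_{n_max}(x)] with L_n = sqrt(2n + 1) * P_n."""
    p_prev, p_curr = 1.0, x
    values = [1.0]
    if n_max >= 1:
        values.append(math.sqrt(3.0) * x)
    for k in range(1, n_max):
        p_next = ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
        values.append(math.sqrt(2 * (k + 1) + 1) * p_next)
        p_prev, p_curr = p_curr, p_next
    return values

def chebyshev_samples(m, rng):
    """Draw m i.i.d. points from the Chebyshev measure via x = cos(pi * U)."""
    return [math.cos(math.pi * rng.random()) for _ in range(m)]

def legendre_matrix(points, N):
    """m x N matrix whose row j is (L_0(x_j), ..., L_{N-1}(x_j))."""
    return [legendre_orthonormal(N - 1, x) for x in points]

def sample_values(Phi, c):
    """Compute y = Phi c."""
    return [sum(row[k] * c[k] for k in range(len(c))) for row in Phi]

rng = random.Random(0)               # fixed seed, only for reproducibility
xs = chebyshev_samples(20, rng)      # m = 20 sampling points
Phi = legendre_matrix(xs, 80)        # N = 80 columns
c = [0.0] * 80
c[0], c[2] = 1.0, 2.0                # illustrative 2-sparse coefficient vector
y = sample_values(Phi, c)            # y_j = g(x_j) = L_0(x_j) + 2 L_2(x_j)
```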

Reconstructing c from the vector y amounts to solving this system of linear equations. As 
we are interested in the underdetermined case m < N, this system typically has infinitely 
many solutions, and our task is to single out the original sparse c. The obvious but 
naive approach for doing this is by solving for the sparsest solution that agrees with the 
measurements,

$$\min \|z\|_0 \quad \text{subject to} \quad \Phi z = y. \qquad (4)$$

Unfortunately, this problem is NP-hard in general [18^ [2]. To overcome this computational
bottleneck the compressive sensing literature has suggested various tractable alternatives
[25t [TOl [38], most notably $\ell_1$-minimization (basis pursuit) [131 ttOl EO], on which we focus
in this paper. Nevertheless, it follows from our findings that greedy algorithms such as
CoSaMP [38] or Iterative Hard Thresholding [7] may also be used for reconstruction.
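
For orientation, the basic Iterative Hard Thresholding iteration can be sketched as below. This is a generic textbook form of the iteration, not the implementation used in the paper. The matrix `B` stands for whatever measurement matrix is used (in the setting here one would take a suitably preconditioned and normalized Legendre matrix), and the unit step size and iteration count are assumptions. Convergence is only expected when `B` behaves like a restricted isometry.

```python
def hard_threshold(z, s):
    """Keep the s largest-magnitude entries of z and set the others to zero."""
    order = sorted(range(len(z)), key=lambda i: abs(z[i]), reverse=True)
    keep = set(order[:s])
    return [z[i] if i in keep else 0.0 for i in range(len(z))]

def iht(B, y, s, step=1.0, iterations=100):
    """Iterate z <- H_s(z + step * B^T (y - B z)), starting from z = 0."""
    m, N = len(B), len(B[0])
    z = [0.0] * N
    for _ in range(iterations):
        residual = [y[j] - sum(B[j][k] * z[k] for k in range(N)) for j in range(m)]
        gradient = [sum(B[j][k] * residual[j] for j in range(m)) for k in range(N)]
        z = hard_threshold([z[k] + step * gradient[k] for k in range(N)], s)
    return z
```

By construction every iterate has at most s non-zero entries, so the output is always s-sparse regardless of whether the iteration has converged.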

Our main result is that any Legendre s-sparse polynomial may be recovered efficiently
from a number of samples $m \asymp s \log^3(s) \log(N)$. Note that at least up to the logarithmic
factors, this rate is optimal. Also the condition on m is implied by the simpler one
$m \asymp s \log^4 N$. Reconstruction is also robust: any polynomial may be recovered efficiently to
within a factor of its best approximation by a Legendre s-sparse polynomial, and, if the
measurements are corrupted by noise, $g(x_1) + \eta_1, \dots, g(x_m) + \eta_m$, to within an additional
factor of the noise level $\varepsilon = \|\eta\|_\infty$. We have






Theorem 2.1. Let $N, m, s \in \mathbb{N}$ be given such that

$$m \ge C s \log^3(s) \log(N).$$

Suppose that m sampling points $(x_1, \dots, x_m)$ are drawn independently at random from
the Chebyshev measure, and consider the $m \times N$ Legendre matrix $\Phi$ with entries
$\Phi_{j,k} = L_{k-1}(x_j)$, and the $m \times m$ diagonal matrix A with entries $a_{j,j} = (\pi/2)^{1/2}(1 - x_j^2)^{1/4}$. Then
with probability exceeding $1 - N^{-\gamma \log^3(s)}$ the following holds for all polynomials
$g(x) = \sum_{k=0}^{N-1} c_k L_k(x)$. Suppose that noisy sample values $y = (g(x_1) + \eta_1, \dots, g(x_m) + \eta_m) = \Phi c + \eta$
are observed, and $\|A\eta\|_\infty \le \varepsilon$. Then the coefficient vector $c = (c_0, c_1, \dots, c_{N-1})$ is
recoverable to within a factor of its best s-term approximation error and to a factor of the
noise level by solving the inequality-constrained $\ell_1$-minimization problem

$$c^\# = \arg\min \|z\|_1 \quad \text{subject to} \quad \|A\Phi z - Ay\|_2 \le \sqrt{m}\,\varepsilon. \qquad (5)$$

Precisely,

$$\|c - c^\#\|_2 \le \frac{C_1 \sigma_s(c)_1}{\sqrt{s}} + C_2 \varepsilon, \qquad (6)$$

and

$$\|c - c^\#\|_1 \le D_1 \sigma_s(c)_1 + D_2 \sqrt{s}\,\varepsilon. \qquad (7)$$

The constants $C, C_1, C_2, D_1, D_2$, and $\gamma$ are universal.

Remark 2.2. (a) In the noiseless ($\varepsilon = 0$) and exactly s-sparse case ($\sigma_s(c)_1 = 0$), the
above theorem implies exact recovery via

$$c^\# = \arg\min \|z\|_1 \quad \text{subject to} \quad \Phi z = y.$$

(b) The condition $\|A\eta\|_\infty \le \varepsilon$ is satisfied in particular if $\|\eta\|_\infty \le \sqrt{2/\pi}\,\varepsilon$,
since $a_{j,j} \le (\pi/2)^{1/2}$.

(c) The proposed recovery method (5) is noise-aware, in that it requires knowledge of the
noise level $\varepsilon$ a priori. One may remove this drawback by using other reconstruction
algorithms such as CoSaMP [38] or Iterative Hard Thresholding [7] which also achieve
the reconstruction rates (6) and (7) under the stated hypotheses, but do not require
knowledge of $\varepsilon$ [7, 38]. Actually, those algorithms always return 2s-sparse vectors
as approximations, in which case the $\ell_1$-stability result (7) follows immediately from
(6), see [61 p. 87] for details.

3 Numerical Experiments 

Let us illustrate the results of Theorem 2.1. In Figure 1(a) we plot a polynomial g that is
5-sparse in the Legendre basis and with maximal degree N = 80 along with m = 20 sampling
points drawn independently from the Chebyshev measure. This polynomial is reconstructed
exactly from the illustrated sampling points as the solution to the $\ell_1$-minimization problem
(5) with $\varepsilon = 0$. In Figure 1(b) we plot the same Legendre-sparse polynomial in solid line,
but the 20 samples have now been corrupted by zero-mean Gaussian noise $y_j = g(x_j) + \eta_j$.
Specifically, we take $\mathbb{E}(|\eta_j|^2) = 0.025$, so that the expected noise level is $\varepsilon \approx \sqrt{0.025} \approx 0.16$. In
the same figure, we superimpose in dashed line the polynomial obtained from these noisy
measurements as the solution of the inequality-constrained $\ell_1$-minimization problem (5)
with noise level $\varepsilon = 0.16$.




Figure 1: (a) A Legendre-5-sparse polynomial of maximal degree N = 80, and its exact
reconstruction from 20 samples drawn independently from the Chebyshev distribution.
(b) The same polynomial (solid line), and its approximate reconstruction from 20 samples
corrupted with noise (dashed line).

To be more complete, we plot a phase diagram illustrating, for N = 300, and several
values of s/m and m/N between 0 and 0.7, the success rate of $\ell_1$-minimization in exactly
recovering Legendre s-sparse polynomials $g(x) = \sum_{k=0}^{N-1} c_k L_k(x)$. The results, illustrated
in Figure 2, show a sharp transition between uniform recovery (in black) and no recovery
whatsoever (white). This transition curve is similar to the phase transition curves obtained
for other compressive sensing matrix ensembles, e.g. the random partial discrete Fourier
matrix or the Gaussian ensemble. For more details, we refer the reader to p2].

Figure 2: Phase diagram illustrating the transition between uniform recovery (black) and
no recovery whatsoever (white) of Legendre-sparse polynomials of sparsity level s and using
m measurements, as s and m vary over the range $s \le m \le N = 300$. In particular, for each
pair (s/m, m/N), we record the rate of success out of 50 trials of $\ell_1$-minimization in
recovering s-sparse coefficient vectors with random support over [N] and with i.i.d. standard
Gaussian coefficients from m measurements distributed according to the Chebyshev measure.

4 Sparse recovery via restricted isometry constants

We prove Theorem 2.1 by showing that the preconditioned Legendre matrix satisfies the
restricted isometry property (RIP) [13, 12]. To begin, let us recall the notion of restricted
isometry constants for a matrix $\Psi$.

Definition 4.1 (Restricted isometry constants). Let $\Psi \in \mathbb{C}^{m \times N}$. For $s \le N$, the restricted
isometry constant $\delta_s$ associated to $\Psi$ is the smallest number $\delta$ for which

$$(1 - \delta)\|c\|_2^2 \le \|\Psi c\|_2^2 \le (1 + \delta)\|c\|_2^2 \qquad (8)$$

for all s-sparse vectors $c \in \mathbb{C}^N$.
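
Equivalently, $\delta_s$ is the largest deviation from 1 of the eigenvalues of the Gram matrices of all column submatrices with at most s columns. For very small s this can be evaluated by brute force. The following sketch, restricted to real matrices and $s \in \{1, 2\}$ so that the eigenvalues of the 2 x 2 Gram matrices have a closed form, is meant only to illustrate the definition; the function name is hypothetical, and the enumeration does not scale to realistic sizes.

```python
import math
from itertools import combinations

def restricted_isometry_constant(Psi, s):
    """Brute-force delta_s of a real matrix Psi (given as a list of rows), s in {1, 2}."""
    if s not in (1, 2):
        raise ValueError("this sketch only handles s = 1 or s = 2")
    m, N = len(Psi), len(Psi[0])
    cols = [[Psi[j][k] for j in range(m)] for k in range(N)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Supports of size 1: deviation of each squared column norm from 1.
    delta = max(abs(dot(col, col) - 1.0) for col in cols)
    if s == 2:
        for u, v in combinations(cols, 2):
            a, b, d = dot(u, u), dot(u, v), dot(v, v)
            # Eigenvalues of the 2 x 2 Gram matrix [[a, b], [b, d]] are mid +/- rad.
            mid, rad = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
            delta = max(delta, abs(mid + rad - 1.0), abs(mid - rad - 1.0))
    return delta
```

For instance, two unit-norm columns with inner product 0.6 give a Gram matrix with eigenvalues 0.4 and 1.6, hence a constant of 0.6 for s = 2.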

Informally, the matrix $\Psi$ is said to have the restricted isometry property if $\delta_s$ is small for
s reasonably large compared to m. For matrices satisfying the restricted isometry property,
the following $\ell_1$-recovery results can be shown [9, 23, 22].

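Theorem 4.2 (Sparse recovery for RIP-matrices). Let $\Psi \in \mathbb{C}^{m \times N}$. Assume that its
restricted isometry constant $\delta_{2s}$ satisfies

$$\delta_{2s} < 3/(4 + \sqrt{6}) \approx 0.4652. \qquad (9)$$

Let $x \in \mathbb{C}^N$ and assume noisy measurements $y = \Psi x + \eta$ are given with $\|\eta\|_2 \le \varepsilon$. Let
$x^\#$ be the minimizer of

$$\arg\min \|z\|_1 \quad \text{subject to} \quad \|\Psi z - y\|_2 \le \varepsilon. \qquad (10)$$

Then

$$\|x - x^\#\|_2 \le C_1 \frac{\sigma_s(x)_1}{\sqrt{s}} + C_2 \varepsilon, \qquad (11)$$

and

$$\|x - x^\#\|_1 \le D_1 \sigma_s(x)_1 + D_2 \sqrt{s}\,\varepsilon. \qquad (12)$$

The constants $C_1, D_1, C_2, D_2 > 0$ depend only on $\delta_{2s}$. In particular, if x is s-sparse then
reconstruction is exact, $x^\# = x$.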

The constant in (9) is the result of several refinements. Candès provided the value
$\sqrt{2} - 1$ in [9], Foucart and Lai the value 0.45 in [23], while the version in (9) was shown
in [22]. The proof of (11) can be found in [9]. The $\ell_1$-error bound (12) is straightforward
from these calculations, but does not seem to appear explicitly in the literature.

So far, all good constructions of matrices with the restricted isometry property use
randomness. The RIP constant for a matrix whose entries are (properly normalized)
independent and identically distributed Gaussian or Bernoulli random variables satisfies $\delta_s \le \delta$
with probability at least $1 - e^{-c_1(\delta) m}$ provided

$$m \ge c_2(\delta)\, s \log(N/s); \qquad (13)$$

see for example [5, 13, 33, 32]. To be more precise, it can be shown that $c_1(\delta) = C_1 \delta^2$ and
$c_2(\delta) = C_2 \delta^{-2}$. Lower bounds for Gelfand widths of $\ell_1$-balls show that the bound (13) is
optimal [^[I5l[2l].



If one allows for slightly more measurements than the optimal number (13), the
restricted isometry property also holds for a rich class of structured random matrices; the
structure of these matrices allows for fast matrix-vector multiplication, which accelerates
the speed of reconstruction procedures such as $\ell_1$-minimization. A quite general class of
structured random matrices are those associated to bounded orthonormal systems. This
concept is introduced in [32], although it is already contained somewhat implicitly in [13, 35]
for discrete systems. Let $\mathcal{D}$ be a measurable space - for instance, a measurable subset of
$\mathbb{R}^d$ - endowed with a probability measure $\nu$. Further, let $\{\psi_j, j \in [N]\}$ be an orthonormal
system of (real or complex-valued) functions on $\mathcal{D}$, i.e.,

$$\int_{\mathcal{D}} \psi_j(x) \overline{\psi_k(x)}\, d\nu(x) = \delta_{j,k}, \quad k, j \in [N]. \qquad (14)$$

If this orthonormal system is uniformly bounded,

$$\sup_{j \in [N]} \|\psi_j\|_\infty = \sup_{j \in [N]} \sup_{x \in \mathcal{D}} |\psi_j(x)| \le K \qquad (15)$$

for some constant $K \ge 1$, we call systems $\{\psi_j\}$ satisfying this condition bounded
orthonormal systems.

Theorem 4.3 (RIP for bounded orthonormal systems). Consider the matrix $\Psi \in \mathbb{C}^{m \times N}$
with entries

$$\Psi_{\ell,k} = \psi_k(x_\ell), \quad \ell \in [m],\ k \in [N], \qquad (16)$$

formed by i.i.d. samples $x_\ell$ drawn from the orthogonalization measure $\nu$ associated to the
bounded orthonormal system $\{\psi_j, j \in [N]\}$ having uniform bound $K \ge 1$ in (15). If

$$m \ge C \delta^{-2} K^2 s \log^3(s) \log(N), \qquad (17)$$

then with probability at least $1 - N^{-\gamma \log^3(s)}$, the restricted isometry constant $\delta_s$ of
$\frac{1}{\sqrt{m}}\Psi$ satisfies $\delta_s \le \delta$. The constants $C, \gamma > 0$ are universal.

We note that condition (17) is stated slightly differently in [32], namely as

$$\frac{m}{\log(m)} \ge C \delta^{-2} K^2 s \log^2(s) \log(N).$$

However, it is easily seen that (17) implies this condition (after possibly adjusting
constants). Note also that (17) is implied by the simpler condition

$$m \ge C K^2 \delta^{-2} s \log^4(N).$$

An important special case of a bounded orthonormal system is the random partial
Fourier matrix, which is formed by choosing a random subset of m rows from the N x N
discrete Fourier matrix. The continuous analog of this system is the matrix associated
to the trigonometric polynomial basis $\{x \mapsto e^{2\pi i n x}, n = 0, \dots, N - 1\}$ evaluated at m
sample points chosen independently from the uniform measure on [0, 1]. Note that the
trigonometric system has corresponding optimal uniform bound K = 1. Another example
is the matrix associated to the Chebyshev polynomial system evaluated at sample points
chosen independently from the corresponding orthogonalization measure, the Chebyshev
measure. In this case, $K = \sqrt{2}$.
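
The Chebyshev example can be checked numerically. The sketch below is illustrative, with hypothetical function names: it uses the normalization $p_0 = 1$, $p_n(x) = \sqrt{2}\cos(n \arccos x)$ for $n \ge 1$, and verifies orthonormality with respect to the Chebyshev probability measure by Gauss-Chebyshev quadrature, which is exact for polynomial integrands of degree below twice the number of nodes.

```python
import math

def chebyshev_orthonormal(n, x):
    """p_0 = 1 and p_n(x) = sqrt(2) * cos(n * arccos(x)) for n >= 1."""
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.acos(x))

def chebyshev_gram(n, k, nodes=64):
    """Integral of p_n * p_k against the Chebyshev probability measure."""
    total = 0.0
    for i in range(1, nodes + 1):
        x = math.cos((2 * i - 1) * math.pi / (2 * nodes))   # Gauss-Chebyshev node
        total += chebyshev_orthonormal(n, x) * chebyshev_orthonormal(k, x)
    return total / nodes        # equal weights 1/nodes for the probability measure
```

Since $|\cos| \le 1$, every $p_n$ is bounded by $\sqrt{2}$ on [-1, 1], which is the uniform bound quoted above.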



5 Proof of Theorem 2.1 



As a first approach towards recovering Legendre-sparse polynomials from random samples,
one may try to apply Theorem 4.3 directly, selecting the sampling points $\{x_j, j \in [m]\}$
independently from the normalized Lebesgue measure on [-1, 1], the orthogonalization
measure for the Legendre polynomials. However, as shown in [37], the $L^\infty$-norms of the
Legendre polynomials grow according to $\|L_n\|_\infty = |L_n(1)| = |L_n(-1)| = (2n + 1)^{1/2}$.
Applying $K = \|L_{N-1}\|_\infty = (2N - 1)^{1/2}$ in Theorem 4.3 produces a required number of
samples

$$m \asymp N \delta^{-2} s \log^3(s) \log(N).$$

Of course, this bound is completely useless, because the required number of samples is now
larger than N - an almost trivial estimate. Therefore, in order to deduce sparse recovery
results for the Legendre polynomials, we must take a different approach.

Despite growing unboundedly with increasing degree at the endpoints +1 and -1, an
important characteristic of the Legendre polynomials is that they are all bounded by the
same envelope function. The following result [37, Theorem 7.3.3] gives a precise estimate
for this bound.

Lemma 5.1. For all $n \ge 1$ and for all $x \in [-1, 1]$,

$$(1 - x^2)^{1/4} |L_n(x)| < 2\pi^{-1/2} \left(1 + \frac{1}{2n}\right)^{1/2}, \quad -1 \le x \le 1;$$

here, the constant $2\pi^{-1/2}$ cannot be replaced by a smaller one.
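
The contrast between the unweighted growth $\|L_n\|_\infty = (2n + 1)^{1/2}$ and the weighted envelope can be observed numerically. The sketch below is illustrative: it reads the bound of Lemma 5.1 as $2\pi^{-1/2}(1 + 1/(2n))^{1/2}$ and estimates the weighted supremum on a uniform grid, so it gives a lower estimate of the true supremum rather than a proof. Function names and the grid size are assumptions.

```python
import math

def legendre_orthonormal(n_max, x):
    """Return [L_0(x), ..., L_{n_max}(x)] with L_n = sqrt(2n + 1) * P_n."""
    p_prev, p_curr = 1.0, x
    values = [1.0]
    if n_max >= 1:
        values.append(math.sqrt(3.0) * x)
    for k in range(1, n_max):
        p_next = ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
        values.append(math.sqrt(2 * (k + 1) + 1) * p_next)
        p_prev, p_curr = p_curr, p_next
    return values

def envelope_bound(n):
    """Right-hand side of the envelope estimate, valid for n >= 1."""
    return 2.0 / math.sqrt(math.pi) * math.sqrt(1.0 + 1.0 / (2 * n))

def weighted_sup(n, grid=2001):
    """Grid estimate of the supremum over [-1, 1] of (1 - x^2)^(1/4) * |L_n(x)|."""
    best = 0.0
    for i in range(grid):
        x = -1.0 + 2.0 * i / (grid - 1)
        weight = max(0.0, 1.0 - x * x) ** 0.25
        best = max(best, weight * abs(legendre_orthonormal(n, x)[n]))
    return best
```

Comparing `legendre_orthonormal(n, 1.0)[n]`, which grows like $\sqrt{2n+1}$, with `weighted_sup(n)`, which stays below `envelope_bound(n)`, shows why the weight $(1 - x^2)^{1/4}$ is the natural preconditioner.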

Proof of Theorem 2.1. In light of Lemma 5.1, we apply a preconditioning technique to
transform the Legendre polynomial system into a bounded orthonormal system. Consider
the functions

$$Q_n(x) = (\pi/2)^{1/2} (1 - x^2)^{1/4} L_n(x). \qquad (18)$$

The matrix $\Psi$ with entries $\Psi_{j,n} = Q_{n-1}(x_j)$ may be written as $\Psi = A\Phi$, where A is the
diagonal matrix with entries $a_{j,j} = (\pi/2)^{1/2}(1 - x_j^2)^{1/4}$ as in Theorem 2.1, and $\Phi \in \mathbb{R}^{m \times N}$
is the Legendre matrix with entries $\Phi_{j,n} = L_{n-1}(x_j)$. By Lemma 5.1, the system $\{Q_n\}$ is
uniformly bounded on [-1, 1] and satisfies the bound $\|Q_n\|_\infty \le \sqrt{2 + 1/n} \le \sqrt{3}$. Due to the
orthonormality of the Legendre system with respect to the normalized Lebesgue measure
on [-1, 1], the $Q_n$ are orthonormal with respect to the Chebyshev probability measure
$d\nu(x) = \pi^{-1}(1 - x^2)^{-1/2}dx$ on [-1, 1]:

$$\int_{-1}^{1} \pi^{-1} Q_n(x) Q_k(x) (1 - x^2)^{-1/2}\,dx = \frac{1}{2} \int_{-1}^{1} L_n(x) L_k(x)\,dx = \delta_{n,k}.$$

Therefore, the $\{Q_n\}$ form a bounded orthonormal system in the sense of Theorem 4.3
with uniform bound $K = \sqrt{3}$. By Theorem 4.3, the renormalized matrix $\frac{1}{\sqrt{m}}\Psi$ has the
restricted isometry property with constant $\delta_s \le \delta$ with high probability once
$m \ge C \delta^{-2} s \log^4(N)$. We then apply Theorem 4.2 to the noisy samples $\frac{1}{\sqrt{m}} A y$ where
$y = (g(x_1) + \eta_1, \dots, g(x_m) + \eta_m)$ and observe that $\|A\eta\|_\infty \le \varepsilon$ implies
$\frac{1}{\sqrt{m}} \|A\eta\|_2 \le \varepsilon$. This gives Theorem 2.1. □



6 Universality of the Chebyshev measure 

The Legendre polynomials are orthonormal with respect to the uniform measure on [-1, 1];
we may instead consider an arbitrary weight function v on [-1, 1], and the polynomials
$\{p_n\}$ that are orthonormal with respect to v. Subject to a mild continuity condition on
v, a result similar to Lemma 5.1 concerning the uniform growth of $p_n$ still holds, and the
sparse recovery results of Theorem 2.1 extend to this more general scenario. In all cases,
the sampling points are chosen according to the Chebyshev measure.

Let us recall the following general bound, see e.g. Theorem 12.1.4 in Szegő [37].

Theorem 6.1. Let v be a weight function on [-1, 1] and set $f_v(\theta) = v(\cos\theta)|\sin(\theta)|$.
Suppose that $f_v$ satisfies the Lipschitz-Dini condition, that is,

$$|f_v(\theta + \delta) - f_v(\theta)| \le L\,|\log(1/\delta)|^{-1-\lambda}, \quad \text{for all } \theta \in [0, 2\pi),\ \delta > 0, \qquad (19)$$

for some constants $L, \lambda > 0$. Let $\{p_n, n \in \mathbb{N}_0\}$ be the associated orthonormal polynomial
system. Then

$$(1 - x^2)^{1/4} v(x)^{1/2} |p_n(x)| \le C_v \quad \text{for all } n \in \mathbb{N},\ x \in [-1, 1]. \qquad (20)$$

The constant $C_v$ depends only on the weight function v.



The Lipschitz-Dini condition (19) is satisfied for a range of Jacobi polynomials
$p_n = p_n^{(\alpha,\beta)}$, $n \ge 0$, $\alpha, \beta \ge -1/2$, which are orthogonal with respect to the weight function
$v(x) = (1 - x)^\alpha (1 + x)^\beta$. The Legendre polynomials are a special case of the Jacobi
polynomials corresponding to $\alpha = \beta = 0$; more generally, the case $\alpha = \beta$ corresponds to the
ultraspherical polynomials. The Chebyshev polynomials are another important special case
of ultraspherical polynomials, corresponding to parameters $\alpha = \beta = -1/2$, and Chebyshev
measure.



For any orthonormal polynomial system satisfying a bound of the form (20), the
following RIP-estimate applies.

Theorem 6.2. Consider a positive weight function v on [-1, 1] satisfying the conditions
of Theorem 6.1, and consider the orthonormal polynomial system $\{p_n\}$ with respect to the
probability measure $d\nu(x) = c\,v(x)dx$ on [-1, 1] where $c^{-1} = \int_{-1}^{1} v(x)\,dx$.

Suppose that m sampling points $(x_1, \dots, x_m)$ are drawn independently at random from
the Chebyshev measure, and consider the $m \times N$ composite matrix $\Psi = A\Phi$, where $\Phi$
is the matrix with entries $\Phi_{j,n} = p_{n-1}(x_j)$, and A is the diagonal matrix with entries
$a_{j,j} = (c\pi)^{1/2} (1 - x_j^2)^{1/4} v(x_j)^{1/2}$. Assume that

$$m \ge C \delta^{-2} s \log^3(s) \log(N). \qquad (21)$$

Then with probability at least $1 - N^{-\gamma \log^3(s)}$ the restricted isometry constant of the composite
matrix $\frac{1}{\sqrt{m}}\Psi = \frac{1}{\sqrt{m}} A\Phi$ satisfies $\delta_s \le \delta$. The constant C depends only on v, and the constant
$\gamma > 0$ is universal.



Proof of Theorem 6.2. Observe that $\Psi_{j,n} = Q_{n-1}(x_j)$, where

$$Q_n(x) = (c\pi)^{1/2} (1 - x^2)^{1/4} v(x)^{1/2} p_n(x).$$

Following Theorem 6.1, the system $\{Q_n\}$ is uniformly bounded on [-1, 1] and satisfies
the bound $\|Q_n\|_\infty \le (c\pi)^{1/2} C_v$; moreover, due to the orthonormality of the polynomials
$\{p_n\}$ with respect to the measure $d\nu(x)$, the $\{Q_n\}$ are orthonormal with respect to the
Chebyshev measure:

$$\int_{-1}^{1} \pi^{-1} Q_n(x) Q_k(x) (1 - x^2)^{-1/2}\,dx = \int_{-1}^{1} c\,p_n(x) p_k(x) v(x)\,dx = \delta_{n,k}. \qquad (22)$$

Therefore, the $\{Q_n\}$ form a bounded orthonormal system with associated matrix $\Psi$ as in
Theorem 6.2 formed from samples $\{x_j\}$ drawn from the Chebyshev distribution. Theorem 4.3
implies that the renormalized composite matrix $\frac{1}{\sqrt{m}}\Psi$ has the restricted isometry
property as stated. □

Corollary 6.3. Consider an orthonormal polynomial system $\{p_n\}$ associated to a measure
v satisfying the conditions of Theorem 6.1. Let $N, m, s \in \mathbb{N}$ satisfy the conditions of
Theorem 6.2, and consider the matrix $\Psi = A\Phi$ as defined there.

Then with probability exceeding $1 - N^{-\gamma \log^3(s)}$ the following holds for all polynomials
$g(x) = \sum_{k=0}^{N-1} c_k p_k(x)$. If noisy sample values $y = (g(x_1) + \eta_1, \dots, g(x_m) + \eta_m) = \Phi c + \eta$
are observed, and $\|\eta\|_\infty \le \varepsilon$, then the coefficient vector $c = (c_0, c_1, \dots, c_{N-1})$ is recoverable
to within a factor of its best s-term approximation error and to a factor of the noise level
by solving the inequality-constrained $\ell_1$-minimization problem

$$c^\# = \arg\min \|z\|_1 \quad \text{subject to} \quad \|A\Phi z - Ay\|_2 \le \sqrt{m}\,\varepsilon. \qquad (23)$$

Precisely,

$$\|c - c^\#\|_2 \le \frac{C_1 \sigma_s(c)_1}{\sqrt{s}} + D_1 \varepsilon,$$

and

$$\|c - c^\#\|_1 \le C_2 \sigma_s(c)_1 + D_2 \sqrt{s}\,\varepsilon. \qquad (24)$$

The constants $C_1, C_2, D_1, D_2$ and $\gamma$ are universal.



As a byproduct of Theorem 6.2, we also obtain condition number estimates for
preconditioned orthogonal polynomial matrices that should be of interest on their own, and
improve on the results in [26]. Theorem 6.2 implies that all submatrices of a preconditioned
random orthogonal polynomial matrix $\frac{1}{\sqrt{m}}\Psi = \frac{1}{\sqrt{m}} A\Phi \in \mathbb{R}^{m \times N}$ with at most s columns
are simultaneously well-conditioned, provided (21) holds. If one is only interested in a
particular subset of s columns, i.e., a particular subset of s orthogonal polynomials, the
number of measurements in (21) can be reduced to

$$m \ge C s \log(s); \qquad (25)$$

see Theorem 7.3 in [32] for more details.

Stability with respect to the sampling measure. 

The requirement that sampling points $x_j$ are drawn from the Chebyshev measure in
the previous theorems can be relaxed somewhat. In particular, suppose that the sampling
points $x_j$ are drawn not from the Chebyshev measure, but from a more general probability
measure $d\mu(x) = \rho(x)dx$ on [-1, 1] with $\rho(x) \ge c'(1 - x^2)^{-1/2}$ (and $\int_{-1}^{1} \rho(x)\,dx = 1$). Now
assume a weight function v satisfying the Lipschitz-Dini condition (19) and the associated
orthonormal polynomials $p_n(x)$ are given. Then, by Theorem 6.1 the functions

$$Q_n(x) = c^{1/2} \rho(x)^{-1/2} v(x)^{1/2} p_n(x) \qquad (26)$$

form a bounded orthonormal system with respect to the probability measure $\rho(x)dx$,
with $\|Q_n\|_\infty \le (c/c')^{1/2} C_v$.
Therefore, all previous arguments are again applicable. We note, however, that taking
$\rho(x)dx$ to be the Chebyshev measure produces the smallest constant K in the boundedness
condition (15) due to normalization reasons.

7 Recovery in infinite-dimensional function spaces

We can transform the previous results into approximation results on the level of continuous
functions. For simplicity, we restrict the scope of this section to the Legendre basis,
although all of our results extend to any orthonormal polynomial system with a
Lipschitz-Dini weight function, as well as to the trigonometric system, for which related results have
not been worked out yet, either.

We introduce the following weighted norm on continuous functions in [-1, 1]:

$$\|f\|_{\infty,w} := \sup_{x \in [-1,1]} |f(x)|\,w(x), \quad w(x) = (\pi/2)^{1/2} (1 - x^2)^{1/4}.$$

Further, we define

$$\sigma_{N,s}(f)_{\infty,w} := \inf_{c \in \mathbb{R}^N} \left\{ \sigma_s(c)_1 + \sqrt{s}\, \Big\| f - \sum_{k=0}^{N-1} c_k L_k \Big\|_{\infty,w} \right\}. \qquad (27)$$

The above quantity involves the best s-term approximation error of c, as well as the ability
of Legendre coefficients $c \in \mathbb{R}^N$ to approximate the given function f in the $L_\infty$-norm. In
some sense, it provides a mixed linear and nonlinear approximation error. The c which
"balances" both error terms determines $\sigma_{N,s}(f)_{\infty,w}$. The factor $\sqrt{s}$ scaling the "linear
approximation part" may seem to lead to non-optimal estimates at first sight, but later on,
the strategy will actually be to choose N in dependence of s such that $\sigma_{N,s}(f)_{\infty,w}$ becomes
of the same order as $\sigma_s(c)_1$. In any case, we note the (suboptimal) estimate

$$\sigma_{N,s}(f)_{\infty,w} \le \sqrt{s}\, \rho_{N,s}(f)_{\infty,w},$$

where

$$\rho_{N,s}(f)_{\infty,w} := \inf_{c \in \mathbb{R}^N,\ \|c\|_0 \le s} \Big\| f - \sum_{k=0}^{N-1} c_k L_k \Big\|_{\infty,w}.$$

Our aim is to obtain a good approximation to a continuous function f from m sample
values, and to compare the approximation error with $\sigma_{N,s}(f)_{\infty,w}$. We have

Proposition 7.1. Let N, m, s be given with

$$m \ge C s \log^3(s) \log(N).$$

Then there exist sampling points $x_1, \dots, x_m$ (i.e., chosen i.i.d. from the Chebyshev
measure) and an efficient reconstruction procedure (i.e., $\ell_1$-minimization), such that for any
continuous function f with associated error $\sigma_{N,s}(f)_{\infty,w}$, the polynomial P of degree at most
N reconstructed from $f(x_1), \dots, f(x_m)$ satisfies

$$\|f - P\|_{\infty,w} \le C' \sigma_{N,s}(f)_{\infty,w}.$$

The constants $C, C' > 0$ are universal.
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The quantity cr7v,s(/)oo,w) involves the two numbers N and s. We now describe how N 
can be chosen in dependence on s, reducing the number of parameters to one. We illustrate 
this strategy below in a more concrete situation. To describe the setup we introduce 
analogues of the Wiener algebra in the Legendre polynomial setting. Let c(/) with entries 



1 



'1 

Cfc(/) = ^ / ^ f{x)Lk{x)dx, k £ No, 



2 

denote the vector of Fourier-Legendre coefficients of /. Then we define 

Ap := {/ G C[-l, 1], ||c(/)||p < oo}, < p < 1, 

with quasi-norm H/Hap := ||c(/)||p. The use of the p-norm is motivated by the Stechkin 
estimate Q below, which tells us that elements in £p can be considered compressible. Since 
||-^^fcw||oo < it follows that 

f{x)w{x) = ^ Ck{f)Lk{x)w{x) 

fceNo 

converges uniformly for / G Ai, so that fw G C[— 1,1], and ||/||oo,to < \/3||/|Ui- Since 
ll/IUi < ll/IUp for < p < 1 this holds also for / G Ap, < p < 1. Now we introduce 

(^sif)Ai ■■= iiif , 11/ - y^CfcLfclUi = crs(c(/))i. 

ce^2(No),||c||(j<s ^ 

By Stechkin's estimate ([T]) (which is also valid in infinite dimensions) we have, for < q < 1, 

as{f)A.<s'-'/^f\U,. (28) 

Our goal is to realize this approximation rate for f £ Aq when only sample values of / 
are given. Additionally, the number of samples should be close (up to log-factors) to the 
number s of degrees of freedom of the reconstructed function. Unfortunately, for this task 
we have to at least know roughly a finite set [N] containing the Fourier-Legendre coefficients 
of a good s-sparse approximation of /. In order to deal with this problem, we introduce, 
for a > 0, a weighted Wiener type space Ai^a, containing the functions / G C[— 1, 1] with 
finite norm 

ll/IU.,. := j;(l + A:)-|cfc(/)|. 

One should imagine $\alpha \ll 1$ very small, so that $f \in A_{1,\alpha}$ does not impose a severe restriction
on $f$, compared to $f \in A_q$. Then instead of $f \in A_q$ we make the slightly stronger
requirement $f \in A_q \cap A_{1,\alpha}$, $0 < q < 1$. The next theorem states that under such assumptions
the optimal rate (28) can be realized when only a small number of sample values of $f$ are
available.
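To make the spaces $A_q$ and $A_{1,\alpha}$ concrete, the Fourier-Legendre coefficients of a given function can be approximated by quadrature and the two norms evaluated on a truncated coefficient vector. This is only a sketch under stated assumptions: it takes $L_k = \sqrt{2k+1}\,P_k$ (the Legendre polynomials orthonormal with respect to $dx/2$), uses Gauss-Legendre quadrature from NumPy, and the test function $1/(2-x)$, the truncation at 60 coefficients, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as npleg

def legendre_coefficients(f, num_coeffs, quad_order=200):
    """Approximate c_k(f) = (1/2) * int_{-1}^{1} f(x) L_k(x) dx for k < num_coeffs,
    where L_k = sqrt(2k + 1) * P_k is orthonormal w.r.t. dx/2 on [-1, 1]."""
    nodes, weights = npleg.leggauss(quad_order)
    # Column k of legvander holds the classical Legendre polynomial P_k at the nodes.
    P = npleg.legvander(nodes, num_coeffs - 1)
    L = P * np.sqrt(2.0 * np.arange(num_coeffs) + 1.0)
    return 0.5 * (weights * f(nodes)) @ L

def wiener_quasinorm(c, q):
    """Truncated version of ||f||_{A_q} = ||c(f)||_q."""
    return float((np.abs(c) ** q).sum() ** (1.0 / q))

def weighted_wiener_norm(c, alpha):
    """Truncated version of ||f||_{A_{1,alpha}} = sum_k (1 + k)^alpha |c_k(f)|."""
    k = np.arange(len(c))
    return float(((1.0 + k) ** alpha * np.abs(c)).sum())

f = lambda x: 1.0 / (2.0 - x)   # hypothetical analytic test function
c = legendre_coefficients(f, num_coeffs=60)
print(wiener_quasinorm(c, 0.5), weighted_wiener_norm(c, 0.1))
```

Because only finitely many coefficients are computed, the printed values are lower bounds for the true norms of this test function, not the norms themselves.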
Theorem 7.2. Let $0 < q < 1$, $\alpha > 0$, and $m, s \in \mathbb{N}$ be given such that

\[ m \ge C \alpha^{-1} \left( \frac{1}{q} - \frac{1}{2} \right) s \log^4(s). \tag{29} \]

Then there exist sampling points $x_1, \dots, x_m \in [-1,1]$ (i.e., random Chebyshev points) such
that for every $f \in A_q \cap A_{1,\alpha}$ a polynomial $P$ of degree at most $N = \lceil s^{(1/q-1/2)/\alpha} \rceil$ can be
reconstructed from the sample values $f(x_1), \dots, f(x_m)$ such that

\[ \frac{1}{\sqrt{3}}\, \|f - P\|_{\infty,w} \le \|f - P\|_{A_1} \le C \left( \|f\|_{A_q} + \|f\|_{A_{1,\alpha}} \right) s^{1-1/q}. \tag{30} \]

Note that up to log-factors the number of required samples is of the order of the number
$s$ of degrees of freedom (the sparsity) allowed in Stechkin's estimate, and the reconstruction
error (30) satisfies the same rate. Clearly $\ell_1$-minimization or greedy alternatives can be
used for reconstruction. This result may be considered as an extension of the theory of
compressive sensing to infinite dimensions (although all the key tools are actually finite
dimensional).
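The sampling and reconstruction pipeline behind the theorem can be sketched in a few lines. The sketch below is an illustration under assumptions, not the procedure analyzed in the paper: it draws Chebyshev-distributed points as $\cos(\pi U)$ with $U$ uniform on $[0,1]$, forms the preconditioned Legendre matrix with weight $w(x) = \sqrt{\pi/2}\,(1-x^2)^{1/4}$ and $L_k = \sqrt{2k+1}\,P_k$, and uses a plain orthogonal matching pursuit as one possible greedy alternative to $\ell_1$-minimization. The dimensions, the sparse test vector, and all function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as npleg

def preconditioned_legendre_matrix(x, N):
    """Rows w(x_l) * L_k(x_l), k = 0..N-1, with L_k = sqrt(2k+1) * P_k and
    w(x) = sqrt(pi/2) * (1 - x^2)^(1/4). Also returns the weights w(x_l)."""
    L = npleg.legvander(x, N - 1) * np.sqrt(2.0 * np.arange(N) + 1.0)
    w = np.sqrt(np.pi / 2.0) * (1.0 - x ** 2) ** 0.25
    return w[:, None] * L, w

def omp(A, y, s):
    """Orthogonal matching pursuit: greedy s-sparse approximate solution of A c = y."""
    col_norms = np.linalg.norm(A, axis=0)
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(s):
        scores = np.abs(A.T @ residual) / col_norms
        if support:
            scores[support] = 0.0          # never pick the same column twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    c = np.zeros(A.shape[1])
    c[support] = coef
    return c

rng = np.random.default_rng(0)
N, s, m = 60, 3, 40
x = np.cos(np.pi * rng.random(m))          # i.i.d. samples from the Chebyshev measure
Psi, w = preconditioned_legendre_matrix(x, N)
c_true = np.zeros(N)
c_true[[2, 17, 41]] = [1.0, -0.5, 0.8]     # hypothetical sparse coefficient vector
scale = np.sqrt(2.0 * np.arange(N) + 1.0)
f_samples = npleg.legvander(x, N - 1) @ (c_true * scale)   # f(x_l) = sum_k c_k L_k(x_l)
c_hat = omp(Psi, w * f_samples, s)
print("recovery error:", np.linalg.norm(c_hat - c_true))
```

Greedy recovery with these particular parameters is not guaranteed to succeed; the point of the sketch is only the structure: Chebyshev sampling, preconditioning by $w$, then a sparse solver applied to the weighted samples.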



7.1 Proof of Proposition 7.1 



Let $P_{\mathrm{opt}} = \sum_{k=0}^{N-1} c_{k,\mathrm{opt}} L_k$ denote the polynomial of degree at most $N-1$ whose coefficient
vector $c_{\mathrm{opt}}$ realizes the approximation error $\sigma_{N,s}(f)_{\infty,w}$, as defined in (27). The samples
$f(x_1), \dots, f(x_m)$ can be seen as noise corrupted samples of $P_{\mathrm{opt}}$, that is, $f(x_\ell) = P_{\mathrm{opt}}(x_\ell) + \eta_\ell$,
and $|\eta_\ell|\, w(x_\ell) \le \|f - P_{\mathrm{opt}}\|_{\infty,w} =: \varepsilon$. The preconditioned system then reads
$f(x_\ell) w(x_\ell) = \sum_{k=0}^{N-1} c_{k,\mathrm{opt}} L_k(x_\ell) w(x_\ell) + \varepsilon_\ell$, with $|\varepsilon_\ell| \le \varepsilon$. According to Theorem 4.3 and Theorem p.lj
the matrix $\Psi$ consisting of entries $\Psi_{\ell,k} = w(x_\ell) L_{k-1}(x_\ell)$ satisfies the RIP with high
probability, provided the stated condition on the minimal number of samples holds. Due
to Theorem 4.2 an application of noise-aware $\ell_1$-minimization (10) to $y = (f(x_\ell) w(x_\ell))_{\ell=1}^{m}$
with $\varepsilon$ replaced by $\sqrt{m}\,\varepsilon$ yields a coefficient vector $c$ satisfying
$\|c - c_{\mathrm{opt}}\|_1 \le C_1 \sigma_s(c_{\mathrm{opt}})_1 + C_2 \sqrt{s}\,\varepsilon$. We denote the polynomial corresponding to this
coefficient vector by $P(x) = \sum_{k=0}^{N-1} c_k L_k(x)$. Then



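\[
\begin{aligned}
\|f - P\|_{\infty,w}
&\le \|f - P_{\mathrm{opt}}\|_{\infty,w} + \|P_{\mathrm{opt}} - P\|_{\infty,w}
 \le \sigma_{N,s}(f)_{\infty,w} + \sqrt{3}\, \|c - c_{\mathrm{opt}}\|_1 \\
&\le \sigma_{N,s}(f)_{\infty,w} + \sqrt{3} \left( C_1 \sigma_s(c_{\mathrm{opt}})_1 + C_2 \sqrt{s}\, \|f - P_{\mathrm{opt}}\|_{\infty,w} \right)
 \le C \sqrt{s}\, \sigma_{N,s}(f)_{\infty,w}.
\end{aligned}
\]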



This completes the proof. 
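The last inequality in the chain can be spelled out as follows. This is a sketch under the assumption, suggested by the notation $\sigma_{N,s}$ but not restated in this section, that the infimum in (27) runs over $s$-sparse coefficient vectors, so that the vector realizing it is $s$-sparse and its best $s$-term approximation error vanishes. It also uses that the weighted error of the optimal polynomial equals $\sigma_{N,s}(f)_{\infty,w}$ and that $s \ge 1$.

```latex
\sigma_s(c_{\mathrm{opt}})_1 = 0
\quad\Longrightarrow\quad
\|f - P\|_{\infty,w}
\le \sigma_{N,s}(f)_{\infty,w} + \sqrt{3}\, C_2 \sqrt{s}\, \sigma_{N,s}(f)_{\infty,w}
\le \bigl(1 + \sqrt{3}\, C_2\bigr) \sqrt{s}\, \sigma_{N,s}(f)_{\infty,w}.
```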

The attentive reader may have noticed that our recovery method, noise-aware $\ell_1$-minimization
(10), requires knowledge of $\sigma_{N,s}(f)_{\infty,w}$, see also Remark 2.2(c). One may remove
this drawback by considering CoSaMP [38] or Iterative Hard Thresholding [7] instead. The
required error estimate in $\ell_1$ follows from the $\ell_2$-stability results for these algorithms in
[3 ESI, as both algorithms produce a $2s$-sparse vector, see P, p. 87] for details.
7.2 Proof of Theorem 7.2



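Let $f \in A_q \cap A_{1,\alpha}$ with Fourier-Legendre coefficients $c_k(f)$. Let $N > s$ be a number to be
chosen later and introduce the truncated Legendre expansion

\[ f_N(x) = \sum_{k=0}^{N-1} c_k(f) L_k(x), \]

which has truncated Fourier-Legendre coefficient vector $c^{(N)}$ with entries $c^{(N)}_k = c_k(f)$ if
$k < N$ and $c^{(N)}_k = 0$ otherwise. Clearly, $\|c^{(N)}\|_q \le \|c(f)\|_q = \|f\|_{A_q}$. Further note that

\[
\frac{1}{\sqrt{3}}\, \|f - f_N\|_{\infty,w} \le \|f - f_N\|_{A_1} = \|c(f) - c^{(N)}\|_1
= \sum_{k=N}^{\infty} |c_k(f)| \le N^{-\alpha} \sum_{k=N}^{\infty} (1+k)^{\alpha} |c_k(f)| \le N^{-\alpha}\, \|f\|_{A_{1,\alpha}}.
\]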



Now we proceed similarly as in the proof of Proposition 7.1 and treat the samples of $f$ as
perturbed samples of $f_N$, that is $f(x_j) = f_N(x_j) + \eta_j$ with
$|\eta_j|\, w(x_j) \le \|f - f_N\|_{\infty,w} \le \sqrt{3}\, N^{-\alpha} \|f\|_{A_{1,\alpha}}$.
Then following the same arguments as in the proof of Proposition 7.1, if

\[ m \ge C s \log^3(s) \log(N), \tag{31} \]

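we can reconstruct a coefficient vector $\tilde{c}$ from samples $f(x_1), \dots, f(x_m)$ with support
contained in $\{0, 1, \dots, N-1\}$ such that

\[
\|c^{(N)} - \tilde{c}\|_1 \le C_1 \sigma_s(c^{(N)})_1 + C_2 \sqrt{s}\, \|f - f_N\|_{\infty,w}
\le C_1 s^{1-1/q}\, \|f\|_{A_q} + C_3 \sqrt{s}\, N^{-\alpha} \|f\|_{A_{1,\alpha}},
\]

with $C_3 = \sqrt{3}\, C_2$. Here, we applied Stechkin's estimate. Therefore,

\[
\|c(f) - \tilde{c}\|_1 \le \|c(f) - c^{(N)}\|_1 + \|c^{(N)} - \tilde{c}\|_1
\le N^{-\alpha} \|f\|_{A_{1,\alpha}} + C_1 s^{1-1/q}\, \|f\|_{A_q} + C_3 \sqrt{s}\, N^{-\alpha} \|f\|_{A_{1,\alpha}}.
\]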



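Now we choose

\[ N = \lceil s^{(1/q - 1/2)/\alpha} \rceil, \tag{32} \]

which yields $\sqrt{s}\, N^{-\alpha} \le s^{1-1/q}$. With this choice, and with $P = \sum_{k=0}^{N-1} \tilde{c}_k L_k$ denoting the
polynomial whose coefficient vector is the reconstructed one,

\[
\frac{1}{\sqrt{3}}\, \|f - P\|_{\infty,w} \le \|f - P\|_{A_1} = \|c(f) - \tilde{c}\|_1
\le C \left( \|f\|_{A_q} + \|f\|_{A_{1,\alpha}} \right) s^{1-1/q}.
\]

Plugging (32) into (31) yields (29), and the proof is finished.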
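The arithmetic behind the choice (32) is easy to reproduce. The following sketch, with hypothetical parameter values and function names, computes $N$ from $s$, $q$, and $\alpha$, checks the inequality $\sqrt{s}\,N^{-\alpha} \le s^{1-1/q}$, and evaluates the right-hand side of (31) without the unspecified constant $C$.

```python
import math

def truncation_degree(s, q, alpha):
    """N = ceil(s^((1/q - 1/2)/alpha)), the choice (32)."""
    return math.ceil(s ** ((1.0 / q - 0.5) / alpha))

def sample_bound_shape(s, N):
    """s * log^3(s) * log(N): right-hand side of (31) without the constant C."""
    return s * math.log(s) ** 3 * math.log(N)

s, q, alpha = 10, 0.5, 0.5                  # hypothetical parameter values
N = truncation_degree(s, q, alpha)
print("N =", N)
print("sqrt(s) * N^(-alpha) =", math.sqrt(s) * N ** (-alpha))
print("s^(1 - 1/q)          =", s ** (1.0 - 1.0 / q))
print("s log^3(s) log(N)    =", sample_bound_shape(s, N))
```

Since $\log N$ is approximately $\alpha^{-1}(1/q - 1/2)\log s$, the last quantity grows like $\alpha^{-1}(1/q - 1/2)\, s \log^4(s)$, which is the shape of condition (29); a smaller $\alpha$ or $q$ increases $N$ and therefore the number of samples only through this logarithmic factor.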



Remark 7.3. Analogous function approximation results can be derived from Theorem 6.2
for any orthogonal polynomial basis whose weight function satisfies the conditions of
Theorem 6.1. The associated norm is ||/||t;,oo = ||\/3^"'^''^/u^||oo- For the Chebyshev
polynomials, , and the corresponding function approximation results in this case are with
respect to the unweighted uniform norm.